1.
bioRxiv ; 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38464295

ABSTRACT

Deep learning has made rapid advances in modeling molecular sequencing data. Despite achieving high performance on benchmarks, it remains unclear to what extent deep learning models learn general principles and generalize to previously unseen sequences. Benchmarks traditionally interrogate model generalizability by generating metadata-based (MB) or sequence-similarity-based (SB) train and test splits of input data before assessing model performance. Here, we show that this approach mischaracterizes model generalizability by failing to consider the full spectrum of cross-split overlap, i.e., similarity between train and test splits. We introduce Spectra, a spectral framework for comprehensive model evaluation. For a given model and input data, Spectra plots model performance as a function of decreasing cross-split overlap and reports the area under this curve as a measure of generalizability. We apply Spectra to 18 sequencing datasets with associated phenotypes, ranging from antibiotic resistance in tuberculosis to protein-ligand binding, to evaluate the generalizability of 19 state-of-the-art deep learning models, including large language models, graph neural networks, diffusion models, and convolutional neural networks. We show that SB and MB splits provide an incomplete assessment of model generalizability. With Spectra, we find that as cross-split overlap decreases, deep learning models consistently exhibit a reduction in performance in a task- and model-dependent manner. Although no model consistently achieved the highest performance across all tasks, we show that deep learning models can generalize to previously unseen sequences on specific tasks. Spectra paves the way toward a better understanding of how foundation models generalize in biology.
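The summary statistic described above, an area under a performance-versus-overlap curve, can be illustrated with a short trapezoidal-rule sketch. This is not the Spectra implementation: the helper name area_under_overlap_curve, the assumption that overlap values lie in [0, 1], and the choice to normalize by the overlap range are all hypothetical choices made for illustration.

```python
from typing import Sequence


def area_under_overlap_curve(overlaps: Sequence[float], scores: Sequence[float]) -> float:
    """Hypothetical helper: trapezoidal area under a performance-vs-overlap curve.

    `overlaps` are cross-split overlap values (assumed to lie in [0, 1]) and
    `scores` are the model's test performance at each split (e.g. accuracy).
    The area is divided by the overlap range, so the result is a mean score.
    """
    if len(overlaps) != len(scores) or len(overlaps) < 2:
        raise ValueError("need at least two (overlap, score) pairs of equal length")
    # Order the points by overlap so integration runs along the x-axis.
    points = sorted(zip(overlaps, scores))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Trapezoid between consecutive points.
        area += (x1 - x0) * (y0 + y1) / 2.0
    width = points[-1][0] - points[0][0]
    if width == 0:
        raise ValueError("overlap values must span a non-zero range")
    return area / width


# A model whose score stays flat as overlap falls keeps its full-overlap score;
# a model that degrades toward low overlap receives a lower summary value.
flat = area_under_overlap_curve([1.0, 0.5, 0.0], [0.9, 0.9, 0.9])
degrading = area_under_overlap_curve([1.0, 0.5, 0.0], [0.9, 0.7, 0.5])
print(flat, degrading)
```

Under these assumptions, two models with the same score at full overlap can be told apart by how their performance holds up as train and test splits become less similar.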

2.
Elife ; 12, 2023 02 08.
Article in English | MEDLINE | ID: mdl-36752391

ABSTRACT

SARS-CoV-2 has adapted in a stepwise manner, with multiple beneficial mutations accumulating in rapid succession at the origins of variants of concern (VOCs), and the reasons for this are unclear. Here, we searched for coordinated evolution of amino acid sites in the spike protein of SARS-CoV-2. Specifically, we searched for concordantly evolving site pairs (CSPs), in which changes at one site were rapidly followed by changes at the other site in the same lineage. We detected 46 sites that formed 45 CSPs. Sites in CSPs were closer to each other in the protein structure than random pairs, indicating that concordant evolution has a functional basis. Notably, site pairs carrying lineage-defining mutations of the four VOCs that circulated before May 2021 are enriched in CSPs. For the Alpha VOC, the enrichment is detected even if Alpha sequences are removed from the analysis, indicating that VOC origin could have been facilitated by positive epistasis. Additionally, we detected nine discordantly evolving pairs of sites, where mutations at one site occurred unexpectedly rarely on the background of a specific allele at another site, for example on the background of wild-type D at site 614 (four pairs) or derived Y at site 501 (three pairs). Our findings hint that positive epistasis between accumulating mutations could have delayed the assembly of advantageous combinations of mutations comprising at least some of the VOCs.


Subject(s)
Amino Acids , Evolution, Molecular , SARS-CoV-2 , Spike Glycoprotein, Coronavirus , Alleles , Mutation , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/genetics
3.
Nat Commun ; 12(1): 2751, 2021 05 12.
Article in English | MEDLINE | ID: mdl-33980847

ABSTRACT

Sequence variants in gene regulatory regions alter gene expression and contribute to phenotypes of individual cells and the whole organism, including disease susceptibility and progression. Single-nucleotide variants in enhancers or promoters may affect gene transcription by altering transcription factor binding sites. Differential transcription factor binding at heterozygous genomic loci provides a natural source of information on such regulatory variants. We present a novel approach to call allele-specific transcription factor binding events at single-nucleotide variants in ChIP-Seq data, taking into account the joint contribution of aneuploidy and local copy number variation, which is estimated directly from variant calls. We conducted a meta-analysis of more than 7 thousand ChIP-Seq experiments and assembled a database of allele-specific binding events listing more than half a million entries at nearly 270 thousand single-nucleotide polymorphisms for several hundred human transcription factors and cell types. These polymorphisms are enriched for associations with phenotypes of medical relevance and often overlap eQTLs, making them candidates for causality by linking variants with molecular mechanisms. In particular, there is a special class of switching sites, where different transcription factors preferentially bind alternative alleles, thus revealing allele-specific rewiring of molecular circuitry.


Subject(s)
Alleles , Genome, Human , Regulatory Sequences, Nucleic Acid/genetics , Transcription Factors/metabolism , Chromatin/metabolism , Databases, Genetic , Gene Dosage , Gene Expression Regulation/genetics , Genome-Wide Association Study , Humans , Nucleotide Motifs , Phenotype , Polymorphism, Single Nucleotide , Protein Binding , Quantitative Trait Loci
4.
Int J Mol Sci ; 19(8), 2018 Jul 31.
Article in English | MEDLINE | ID: mdl-30065198

ABSTRACT

The cytokines secreted by immune cells have a large impact on the tissue surrounding a fracture, e.g., by attracting osteoprogenitor cells. However, the underlying mechanisms are not yet fully understood. Thus, this study aims to investigate the molecular mechanisms of the immune cell-mediated migration of immature primary human osteoblasts (phOBs), with transforming growth factor beta (TGF-β), nicotinamide adenine dinucleotide phosphate (NADPH) oxidase 4 (NOX4) and focal adhesion kinase (FAK) as possible regulators. Monocyte- and macrophage-conditioned media (THP-1 cells ± phorbol 12-myristate 13-acetate (PMA) treatment), unlike granulocyte-conditioned medium (HL-60 cells + dimethyl sulfoxide (DMSO) treatment), induce migration of phOBs. Monocyte- and macrophage (THP-1 cells)-conditioned media activate Smad3-dependent TGF-β signaling in the phOBs. Stimulation with TGF-β promotes migration of phOBs. Furthermore, TGF-β treatment strongly induces NOX4 expression at both the mRNA and protein levels. The associated accumulation of reactive oxygen species (ROS) results in phosphorylation of FAK at Y397. Blocking TGF-β signaling, NOX4 activity and FAK signaling effectively inhibits the migration of phOBs towards TGF-β. In summary, our data suggest that monocytic- and macrophage-like cells induce migration of phOBs in a TGF-β-dependent manner, with TGF-β-dependent induction of NOX4, associated production of ROS and resulting activation of FAK as key mediators.


Subject(s)
Focal Adhesion Protein-Tyrosine Kinases/metabolism , NADPH Oxidase 4/metabolism , Transforming Growth Factor beta/metabolism , Cell Differentiation/drug effects , Cell Movement/drug effects , Cells, Cultured , HL-60 Cells , Humans , Phosphorylation/drug effects , Reactive Oxygen Species/metabolism , Signal Transduction/drug effects , THP-1 Cells , Tetradecanoylphorbol Acetate/analogs & derivatives , Tetradecanoylphorbol Acetate/pharmacology